
Recurrent Neural Networks

Recurrent Neural Networks (RNNs) are a type of neural network designed for processing sequences by leveraging hidden states to capture temporal information. They are particularly well-suited for tasks like language modeling, where the goal is to predict the next token based on the sequence of preceding tokens.

Basics of RNNs

  • Latent Variable Models: RNNs use a latent variable model to approximate the probability of a token $x_t$ given all previous tokens $x_1, \ldots, x_{t-1}$. This is represented mathematically as:

    $$P(x_t \mid x_{t-1}, \ldots, x_1) \approx P(x_t \mid h_{t-1}),$$

    where $h_{t-1}$ denotes the hidden state at time $t-1$.

  • Hidden State Calculation: The hidden state $h_t$ is updated at each timestep from the current input $x_t$ and the previous hidden state $h_{t-1}$ via a function $f$:

    $$h_t = f(x_t, h_{t-1}).$$

    This function, often nonlinear, allows the RNN to compactly represent the history of observed data up to the current timestep.

  • Difference from Hidden Layers: Hidden states in RNNs should not be confused with hidden layers in other types of neural networks. Hidden states serve as inputs to each step of the RNN, reflecting the sequence's memory up to that point.
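The recurrence described above can be sketched directly. This is a minimal illustration, not the full model: $f$ is chosen here as a tanh affine update, and the dimensions and random weights are illustrative assumptions.

```python
import torch

torch.manual_seed(0)

def f(x_t, h_prev, W_xh, W_hh, b_h):
    # One update step: the new hidden state mixes the current input
    # with the previous hidden state through a nonlinearity.
    return torch.tanh(x_t @ W_xh + h_prev @ W_hh + b_h)

input_dim, hidden_dim = 4, 3
W_xh = torch.randn(input_dim, hidden_dim)
W_hh = torch.randn(hidden_dim, hidden_dim)
b_h = torch.zeros(hidden_dim)

h = torch.zeros(1, hidden_dim)          # h_0: empty memory before any input
for t in range(5):                      # unroll over a 5-step sequence
    x_t = torch.randn(1, input_dim)
    h = f(x_t, h, W_xh, W_hh, b_h)      # h_t summarizes the whole history so far
print(h.shape)  # torch.Size([1, 3])
```

Note that the same weights $W_{xh}$, $W_{hh}$, $b_h$ are reused at every timestep; only the hidden state changes.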

Neural Networks without Hidden States

For a simpler neural network such as a multilayer perceptron (MLP) with a single hidden layer, the computation involves no temporal dynamics:

$$\mathbf{H} = \phi(\mathbf{X} \mathbf{W}_{\textrm{xh}} + \mathbf{b}_\textrm{h}),$$

where $\phi$ is an activation function, and $\mathbf{W}_{\textrm{xh}}$, $\mathbf{b}_\textrm{h}$ are the weight and bias parameters, respectively.
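As a quick sketch of this hidden-layer computation (with $\phi = \tanh$; the minibatch and layer sizes are illustrative assumptions):

```python
import torch

torch.manual_seed(0)
X = torch.randn(2, 4)            # minibatch of 2 examples, 4 features each
W_xh = torch.randn(4, 3)         # weights: 4 inputs -> 3 hidden units
b_h = torch.zeros(3)             # bias
H = torch.tanh(X @ W_xh + b_h)   # H depends only on X, not on any earlier step
print(H.shape)  # torch.Size([2, 3])
```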

Recurrent Neural Networks with Hidden States

In contrast to the non-recurrent model, RNNs maintain a hidden state across timesteps, updating it recurrently from both the current input and the previous hidden state:

$$\mathbf{H}_t = \phi(\mathbf{X}_t \mathbf{W}_{\textrm{xh}} + \mathbf{H}_{t-1} \mathbf{W}_{\textrm{hh}} + \mathbf{b}_\textrm{h}).$$

This recurrent update mechanism allows RNNs to carry information across many timesteps, making them well suited to tasks like time series forecasting and language modeling.
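The two matrix products in this update are equivalent to a single product of the concatenated input and hidden state with the vertically stacked weight matrices, a standard identity worth verifying. The shapes below are illustrative assumptions.

```python
import torch

torch.manual_seed(0)
X_t = torch.randn(3, 1)          # minibatch of 3, input dimension 1
H_prev = torch.randn(3, 4)       # previous hidden state, hidden dimension 4
W_xh = torch.randn(1, 4)
W_hh = torch.randn(4, 4)
b_h = torch.zeros(4)

# H_t = phi(X_t W_xh + H_{t-1} W_hh + b_h) with phi = tanh
H_t = torch.tanh(X_t @ W_xh + H_prev @ W_hh + b_h)

# Same computation via concatenation: [X_t, H_{t-1}] @ [[W_xh], [W_hh]]
H_t_cat = torch.tanh(torch.cat([X_t, H_prev], dim=1)
                     @ torch.cat([W_xh, W_hh], dim=0) + b_h)
print(torch.allclose(H_t, H_t_cat))  # True
```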

RNN-Based Character-Level Language Models

An RNN can be used to model language at the character level, where the network predicts the next character based on the past sequence of characters. This approach involves:

  • Shifting the sequence by one position to align inputs and labels for training (e.g., input: "machin", label: "achine", so each label is the character that follows the corresponding input character).
  • Using softmax and cross-entropy loss to train the model on predicting the next character in the sequence.
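The shift-and-train recipe above can be sketched as follows; the tiny seven-character vocabulary built from "machine" and the random logits standing in for model outputs are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

text = "machine"
vocab = sorted(set(text))                  # ['a', 'c', 'e', 'h', 'i', 'm', 'n']
stoi = {ch: i for i, ch in enumerate(vocab)}

inputs = torch.tensor([stoi[ch] for ch in text[:-1]])   # "machin"
labels = torch.tensor([stoi[ch] for ch in text[1:]])    # "achine"

# Pretend the model produced logits over the vocabulary at each position;
# cross_entropy applies log-softmax internally before comparing with labels.
torch.manual_seed(0)
logits = torch.randn(len(inputs), len(vocab))
loss = F.cross_entropy(logits, labels)
print(inputs.shape, labels.shape, loss.item())
```

The key point is that inputs and labels have the same length: the label at position $t$ is simply the input character at position $t+1$.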


RNN in Python

Below is a basic example using PyTorch:

import torch
import torch.nn as nn

class SimpleRNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.hidden_size = hidden_size
        self.rnn = nn.RNN(input_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        out, _ = self.rnn(x)            # out: (batch, seq_len, hidden_size)
        out = self.fc(out[:, -1, :])    # use the last timestep's hidden state
        return out

# Example usage
rnn = SimpleRNN(input_size=10, hidden_size=20, output_size=1)
inputs = torch.randn(5, 10, 10)  # (batch_size, sequence_length, input_size)
output = rnn(inputs)
print(output)

This Python code defines a simple RNN module using PyTorch's nn.RNN layer. It processes input sequences and returns output using a fully connected layer after the last sequence element has been processed.

Customizing an RNN in PyTorch

Here is a more comprehensive example, adapted from the PyTorch tutorial on building a character-level RNN for classifying names by language of origin. It includes the data-preparation helpers, the RNN model, and the training loop.

import torch
import torch.nn as nn
import torch.nn.functional as F
import unicodedata
import random
import time
import math

# Helper function to convert a Unicode string to plain ASCII
def unicodeToAscii(s):
    return ''.join(c for c in unicodedata.normalize('NFD', s)
                   if unicodedata.category(c) != 'Mn')

# Read a file and split into lines
def readLines(filename):
    lines = open(filename, encoding='utf-8').read().strip().split('\n')
    return [unicodeToAscii(line) for line in lines]

# RNN model: input and hidden state are concatenated, then mapped to the
# next hidden state (i2h) and to the output logits (i2o)
class RNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(RNN, self).__init__()
        self.hidden_size = hidden_size
        self.i2h = nn.Linear(input_size + hidden_size, hidden_size)
        self.i2o = nn.Linear(input_size + hidden_size, output_size)
        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, input, hidden):
        combined = torch.cat((input, hidden), 1)
        hidden = self.i2h(combined)
        output = self.i2o(combined)
        output = self.softmax(output)
        return output, hidden

    def initHidden(self):
        return torch.zeros(1, self.hidden_size)

# NLLLoss pairs with the LogSoftmax output above
criterion = nn.NLLLoss()
learning_rate = 0.005

# Note: the tutorial instantiates the model from the dataset's sizes, e.g.
# rnn = RNN(n_letters, n_hidden, n_categories)

# Training step for a single (category, name) example
def train(category_tensor, line_tensor):
    hidden = rnn.initHidden()
    rnn.zero_grad()
    for i in range(line_tensor.size()[0]):
        output, hidden = rnn(line_tensor[i], hidden)
    loss = criterion(output, category_tensor)
    loss.backward()
    # Manual SGD update on the parameters
    for p in rnn.parameters():
        p.data.add_(p.grad.data, alpha=-learning_rate)
    return output, loss.item()

# Elapsed-time formatting for progress logging
def timeSince(since):
    now = time.time()
    s = now - since
    m = math.floor(s / 60)
    s -= m * 60
    return '%dm %ds' % (m, s)

# Training loop; randomTrainingExample and categoryFromOutput are data
# helpers defined earlier in the PyTorch tutorial
n_iters = 100000
print_every = 5000
plot_every = 1000
all_losses = []
current_loss = 0
start = time.time()

for iter in range(1, n_iters + 1):
    category, line, category_tensor, line_tensor = randomTrainingExample()
    output, loss = train(category_tensor, line_tensor)
    current_loss += loss

    if iter % print_every == 0:
        guess, guess_i = categoryFromOutput(output)
        correct = '✓' if guess == category else '✗ (%s)' % category
        print('%d %d%% (%s) %.4f %s / %s %s' % (
            iter, iter / n_iters * 100, timeSince(start), loss, line, guess, correct))

    if iter % plot_every == 0:
        all_losses.append(current_loss / plot_every)
        current_loss = 0